import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from umap import UMAP
import seaborn as sns
import plotly.express as px
import pandas as pd
import plotly.graph_objs as go
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)
The dataset is a microscopy image dataset for image classification, acquired from a previous course, AI in Life Sciences. It consists of 9632 images across 9 classes; of these, 7890 images are used for training and the rest for testing. The goal is to analyze the learning process of the model and to visualize its features. We investigate Deep Learning (DL) training algorithms and their influence on the explainability of neural network models. The overall aim is to visualize the flow of information within the deep neural network using factors that humans can interpret, even if the underlying model uses more complex factors, which enables the generation of human-interpretable explanations.
In this notebook, we visualize the learning process of the model, focusing on its inter-epoch trajectory. Initially, the network has random weights; as the epochs proceed, it learns features from the data and uses the updated weights to improve the classification results. The hidden-layer features are visualized and should show increasingly clear cluster formation.
To better understand how the features.npz file was created, please refer to the Train_Model.ipynb notebook. In short, features.npz contains the hidden-layer features of the model, extracted after each epoch and saved to disk; this file is then used to visualize the learning process.
# load the data
data = np.load(r"./data/features.npz")["arr_0"]
labels = np.load(r"./data/labels.npy")
labels = labels.tolist()
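Before plotting, it can help to sanity-check the array shapes. A minimal sketch with synthetic stand-ins (the 10 × 1742 × 1280 shape is inferred from the 17420-row table later in this notebook, not verified against the real features.npz; the `demo_` names are placeholders):

```python
import numpy as np

# hypothetical stand-in for the loaded arrays:
# 10 epochs x 1742 images x 1280 hidden-layer features
rng = np.random.default_rng(0)
demo_data = rng.normal(size=(10, 1742, 1280)).astype(np.float32)
demo_labels = rng.integers(0, 9, size=1742)

# checks one might run right after loading
n_epochs, n_images, n_features = demo_data.shape
assert n_epochs == 10 and n_features == 1280
assert len(demo_labels) == n_images  # one label per image, shared across epochs
```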
We already add some metadata to the data: the number of classes, which is 9. This metadata helps us visualize and better understand the data. We define a unique color for each class, which will be used to visualize the data. In the end, the color values are divided by 255 to bring them into the range 0 to 1.
colors_per_class = {
"A549": [254, 202, 87],
"CACO-2": [255, 107, 107],
"HEK 293": [10, 189, 227],
"HeLa": [255, 159, 243],
"MCF7": [16, 172, 132],
"PC-3": [128, 80, 128],
"RT4": [87, 101, 116],
"U-2 OS": [52, 31, 151],
"U-251 MG": [100, 100, 255],
}
colors = [colors_per_class[label] for label in labels]
colors = np.array(colors) / 255
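As a quick illustration, the mapping above produces one RGB triple per sample, scaled into the 0–1 range that matplotlib's `c=` argument expects. A minimal sketch with a reduced, hypothetical color table (`demo_` names are placeholders):

```python
import numpy as np

# reduced color table just for this illustration; values are 8-bit RGB
demo_colors_per_class = {"A549": [254, 202, 87], "HeLa": [255, 159, 243]}
demo_labels = ["HeLa", "A549", "HeLa"]

# one normalized RGB triple per sample
demo_colors = np.array([demo_colors_per_class[lab] for lab in demo_labels]) / 255
assert demo_colors.shape == (3, 3)
assert 0.0 <= demo_colors.min() and demo_colors.max() <= 1.0
```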
For the down-projection we use PCA, t-SNE and UMAP, applied to the hidden-layer features of the model. The projection serves both to get a better understanding of the data and to see which projection method works best for it. For now, the projections use default settings.
# plot the data
pca = PCA(n_components=2)
fig, ax = plt.subplots(2, 5, figsize=(15, 10))
for i in range(2):
    for j in range(5):
        data_2d = pca.fit_transform(data[i * 5 + j])
        ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
        ax[i, j].set_title("Epoch {}".format(i * 5 + j))
plt.show()
Observations: We can see that PCA did not separate the points much; they remain close to each other, so the PCA projection is not well suited for this data. However, the separation in epoch 7 is better than in the other epochs. In the first epoch the points are very close together; as the epochs proceed, they become more separated.
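The visual impression that clusters separate over later epochs could also be quantified, for example with scikit-learn's silhouette score on the 2-D projection (values near 1 mean well-separated clusters). A sketch on synthetic data standing in for an early and a late epoch (`demo_` names are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# synthetic stand-ins for an early and a late epoch: same 3 classes,
# whose means drift apart as training progresses
demo_labels = np.repeat([0, 1, 2], 100)
early = rng.normal(size=(300, 50))
late = early + demo_labels[:, None] * 2.0  # classes pushed apart

demo_pca = PCA(n_components=2)
score_early = silhouette_score(demo_pca.fit_transform(early), demo_labels)
score_late = silhouette_score(demo_pca.fit_transform(late), demo_labels)
print(score_early, score_late)  # the later "epoch" scores higher
```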
fig, ax = plt.subplots(2, 5, figsize=(15, 10))
tsne = TSNE(n_components=2, verbose=0, n_jobs=-1)
for i in range(2):
    for j in range(5):
        data_2d = tsne.fit_transform(data[i * 5 + j])
        ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
        ax[i, j].set_title("Epoch {}".format(i * 5 + j))
plt.show()
Observations: With t-SNE the points are more separated than with PCA; the results are very good and match what we expected from t-SNE. One can clearly see that the points in the first epoch are very close together; as the epochs proceed, they become more separated.
# plot the data using UMAP
fig, ax = plt.subplots(2, 5, figsize=(15, 10))
umap = UMAP(n_components=2, verbose=0, n_jobs=-1)
for i in range(2):
    for j in range(5):
        data_2d = umap.fit_transform(data[i * 5 + j])
        ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
        ax[i, j].set_title("Epoch {}".format(i * 5 + j))
plt.show()
Observations: UMAP also worked very well. The points are more separated than with PCA and t-SNE. However, some classes are not separated as well in the last epoch compared to t-SNE.
Here, all the epochs are plotted into one big plot.
# plot the data
fig, ax = plt.subplots(figsize=(15, 10))
for i in range(10):
    tsne = TSNE(n_components=2, init="pca", verbose=0, n_jobs=-1)
    data_2d = tsne.fit_transform(data[i])
    ax.scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
plt.show()
# plot the data
fig, ax = plt.subplots(figsize=(15, 10))
tsne = TSNE(n_components=2, init="pca", verbose=0, n_jobs=-1)
for i in range(10):
    data_2d = tsne.fit_transform(data[i])
    ax.scatter(data_2d[:, 0], data_2d[:, 1], s=1, label="Epoch {}".format(i))
    ax.plot(data_2d[:, 0], data_2d[:, 1], alpha=0.09)
ax.legend()
plt.show()
There is not a lot of information to read out of this plot, since each epoch is embedded independently; a fix for this comes later in the notebook. One can still see that there are many points in the middle, while for the last epoch many points lie in the corners.
To find the best metric for t-SNE, we run over three different metrics: euclidean, manhattan and hamming. The results are shown in the following plots. The results for euclidean and manhattan are not very different, with euclidean giving the best results; the results with the hamming metric are very bad.
TSNE_METRICS = ["euclidean", "manhattan", "hamming"]
for metric in TSNE_METRICS:
    fig, ax = plt.subplots(2, 5, figsize=(15, 10))
    for i in range(2):
        for j in range(5):
            tsne = TSNE(n_components=2, init="pca", verbose=0, n_jobs=-1, metric=metric)
            data_2d = tsne.fit_transform(data[i * 5 + j])
            ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
            ax[i, j].set_title("Epoch {}".format(i * 5 + j))
    fig.suptitle("t-SNE with metric {}".format(metric))
    plt.show()
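A likely reason the hamming metric performs so poorly: it measures the fraction of components that differ *exactly*, which degenerates on continuous activations, where essentially every component differs, so all pairwise distances collapse to roughly 1 and carry no structure. A small illustration with scipy's hamming distance (scipy is already a dependency of scikit-learn):

```python
import numpy as np
from scipy.spatial.distance import hamming

rng = np.random.default_rng(0)
a, b = rng.normal(size=1280), rng.normal(size=1280)

# for continuous random vectors, every position differs,
# so the hamming distance saturates at 1 for all pairs
print(hamming(a, b))  # → 1.0
print(hamming(a, a))  # → 0.0
```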
To find the best metric for UMAP, we run over four different metrics: euclidean, hamming, manhattan and correlation.
UMAP_METRICS = [
    "euclidean",
    "hamming",
    "manhattan",
    "correlation",
]
for metric in UMAP_METRICS:
    fig, ax = plt.subplots(2, 5, figsize=(15, 10))
    umap = UMAP(n_components=2, metric=metric, verbose=0, n_jobs=-1)
    for i in range(2):
        for j in range(5):
            data_2d = umap.fit_transform(data[i * 5 + j])
            ax[i, j].scatter(data_2d[:, 0], data_2d[:, 1], s=1, c=colors)
            ax[i, j].set_title("Epoch {}".format(i * 5 + j))
    fig.suptitle("UMAP with metric {}".format(metric))
    plt.show()
Observations: Almost all metrics give similar results. Good results are obtained with the euclidean, manhattan and correlation metrics; the results with the hamming metric are very bad.
Until now we were using the 3D numpy array to plot the data. Now we bring the data into table format: we iterate over every epoch and every image and store the 1280 hidden-layer activations in a table. The table has 1280 feature columns, and the number of rows equals the number of images times the number of epochs.
def generate_table_data(data):
    table_data = []
    for epoch in range(data.shape[0]):
        for image in range(data.shape[1]):
            table_data.append([epoch, image, *data[epoch, image]])
    return table_data

table_data = generate_table_data(data)
df = pd.DataFrame(
    table_data,
    columns=["epoch", "image", *["x_{}".format(i) for i in range(data.shape[2])]],
)
df["label"] = labels * 10  # repeat the per-image labels once per epoch
df
| | epoch | image | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | x_6 | x_7 | ... | x_1271 | x_1272 | x_1273 | x_1274 | x_1275 | x_1276 | x_1277 | x_1278 | x_1279 | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | -0.061640 | 0.669427 | -0.037390 | -0.072912 | 0.177839 | -0.117206 | 0.195436 | 0.039985 | ... | 0.015348 | 0.738867 | 0.462114 | -0.096684 | -0.084439 | 0.376865 | 0.006146 | 0.499843 | 0.132847 | U-2 OS |
| 1 | 0 | 1 | -0.174966 | -0.015820 | -0.057514 | 0.045858 | 0.007711 | 0.070376 | 0.135525 | 1.686104 | ... | 0.009815 | -0.129545 | -0.144017 | -0.123344 | 1.071622 | 0.496978 | -0.097275 | -0.057270 | -0.029485 | CACO-2 |
| 2 | 0 | 2 | -0.143236 | 0.192292 | -0.024325 | 0.146251 | -0.117585 | -0.097257 | -0.029092 | 0.468650 | ... | -0.003601 | 0.068411 | -0.039624 | 0.016348 | 0.345940 | 0.581720 | -0.153233 | -0.116875 | -0.026813 | HeLa |
| 3 | 0 | 3 | 0.172030 | 1.869631 | 0.315535 | -0.115844 | 0.644933 | -0.177177 | -0.084760 | -0.005555 | ... | -0.004633 | 0.210583 | -0.106762 | 0.399959 | 0.495958 | 0.392284 | 0.268443 | 0.297449 | -0.048701 | CACO-2 |
| 4 | 0 | 4 | -0.129526 | 0.141332 | -0.056453 | 0.363036 | -0.175806 | -0.139477 | -0.146133 | -0.091358 | ... | 1.038976 | 0.721481 | -0.086710 | -0.083307 | 0.307861 | 0.017224 | -0.127287 | 0.325065 | -0.097920 | MCF7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17415 | 9 | 1737 | -0.145046 | -0.128737 | -0.112647 | 0.142011 | 1.271666 | -0.239937 | -0.147267 | 0.206571 | ... | 0.059964 | -0.173431 | -0.154945 | -0.042000 | -0.135171 | -0.009146 | 0.363955 | -0.188163 | -0.091321 | PC-3 |
| 17416 | 9 | 1738 | 2.309876 | -0.107943 | 0.096737 | 0.388151 | -0.094068 | 0.135940 | 1.428808 | 0.396734 | ... | -0.114613 | 0.117047 | 0.073088 | -0.030453 | 0.216468 | 0.138633 | 1.105582 | -0.129447 | 1.259154 | HEK 293 |
| 17417 | 9 | 1739 | -0.109858 | -0.120113 | 2.313648 | 1.052860 | 0.306034 | 0.665497 | -0.037204 | 0.915989 | ... | 0.494094 | -0.126748 | -0.015180 | -0.131770 | 0.307820 | -0.096703 | -0.070606 | -0.128338 | 0.676635 | RT4 |
| 17418 | 9 | 1740 | -0.166764 | -0.079054 | -0.100959 | -0.092999 | 0.723891 | -0.145387 | 0.248314 | 0.729864 | ... | 0.588329 | 1.497055 | -0.054499 | 0.220291 | -0.073841 | -0.063293 | -0.061063 | -0.122453 | -0.083355 | PC-3 |
| 17419 | 9 | 1741 | -0.180064 | 0.002528 | -0.110202 | 0.128322 | 1.424868 | -0.206283 | -0.096041 | 0.099965 | ... | 0.156708 | 0.045574 | -0.079024 | 0.390233 | -0.107368 | -0.099926 | -0.159344 | -0.133758 | -0.077656 | PC-3 |
17420 rows × 1283 columns
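As an aside, the loop-based table construction above could also be sketched with pure NumPy reshaping, which is typically faster for arrays of this size. A sketch on a small synthetic array (the `demo_` names are placeholders; column names follow the notebook's `x_i` convention):

```python
import numpy as np
import pandas as pd

# small synthetic stand-in: 2 epochs x 3 images x 4 features
demo = np.arange(24, dtype=float).reshape(2, 3, 4)
n_epochs, n_images, n_features = demo.shape

demo_df = pd.DataFrame(
    demo.reshape(n_epochs * n_images, n_features),
    columns=["x_{}".format(i) for i in range(n_features)],
)
# epoch/image indices in the same row order as the nested loops above
demo_df.insert(0, "image", np.tile(np.arange(n_images), n_epochs))
demo_df.insert(0, "epoch", np.repeat(np.arange(n_epochs), n_images))
print(demo_df.shape)  # → (6, 6)
```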
sns.set(rc={"figure.figsize": (20, 15)})
# note: the "x" and "y" columns are created by the UMAP projection
# in the cell below, so that cell has to be run before this one
fig, ax = plt.subplots(2, 5, figsize=(20, 15))
for i in range(2):
    for j in range(5):
        sns.scatterplot(
            x="x",
            y="y",
            hue="label",
            data=df[df["epoch"] == i * 5 + j],
            legend="full",
            ax=ax[i, j],
        )
        ax[i, j].set_title("Epoch {}".format(i * 5 + j))
plt.show()
Observations: From the above plots, starting from epoch 5 the features clearly move into clusters, which indicates that the model converges and is able to generalize to the unseen dataset.
umap = UMAP(n_components=2, verbose=0, n_jobs=-1)
data_2d = umap.fit_transform(df.drop(["epoch", "image", "label"], axis=1))
df["x"] = data_2d[:, 0]
df["y"] = data_2d[:, 1]
fig = px.scatter(df, x="x", y="y", color="label", title="UMAP over all epochs")
fig.show()
pca = PCA(n_components=2)
# drop all non-feature columns, including the UMAP "x"/"y" coordinates
# from above, so they do not leak into the projection
data_2d = pca.fit_transform(df.drop(["epoch", "image", "label", "x", "y"], axis=1))
df["x_pca"] = data_2d[:, 0]
df["y_pca"] = data_2d[:, 1]
df["dp_method"] = "pca"
TSNE_METRICS = ["euclidean", "manhattan"]
for metric in TSNE_METRICS:
    tsne = TSNE(n_components=2, verbose=0, n_jobs=-1, metric=metric)
    # drop all non-feature columns (including the string column "dp_method"
    # and earlier projection coordinates) before fitting
    data_2d = tsne.fit_transform(
        df.drop(
            ["epoch", "image", "label", "dp_method", "x", "y", "x_pca", "y_pca"],
            axis=1,
        )
    )
    df["x_tsne_{}".format(metric)] = data_2d[:, 0]
    df["y_tsne_{}".format(metric)] = data_2d[:, 1]
UMAP_METRICS = [
    "euclidean",
    "hamming",
    "manhattan",
    "correlation",
]
for metric in UMAP_METRICS:
    umap = UMAP(n_components=2, verbose=0, n_jobs=-1, metric=metric)
    # again, drop all non-feature columns before fitting
    data_2d = umap.fit_transform(
        df.drop(
            [
                "epoch", "image", "label", "dp_method", "x", "y",
                "x_pca", "y_pca",
                "x_tsne_euclidean", "y_tsne_euclidean",
                "x_tsne_manhattan", "y_tsne_manhattan",
            ],
            axis=1,
        )
    )
    df["x_umap_{}".format(metric)] = data_2d[:, 0]
    df["y_umap_{}".format(metric)] = data_2d[:, 1]
df
| | epoch | image | x_0 | x_1 | x_2 | x_3 | x_4 | x_5 | x_6 | x_7 | ... | x_tsne_manhattan | y_tsne_manhattan | x_umap_euclidean | y_umap_euclidean | x_umap_hamming | y_umap_hamming | x_umap_manhattan | y_umap_manhattan | x_umap_correlation | y_umap_correlation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | -0.061640 | 0.669427 | -0.037390 | -0.072912 | 0.177839 | -0.117206 | 0.195436 | 0.039985 | ... | -10.051302 | 21.774466 | 8.006492 | 6.078372 | 9.615277 | 2.454584 | 1.377528 | 8.469635 | 5.849822 | 2.634155 |
| 1 | 0 | 1 | -0.174966 | -0.015820 | -0.057514 | 0.045858 | 0.007711 | 0.070376 | 0.135525 | 1.686104 | ... | -29.252380 | -2.116282 | 8.482591 | 5.685714 | 10.234534 | 3.452316 | 0.627635 | 8.200347 | 6.781205 | 1.870595 |
| 2 | 0 | 2 | -0.143236 | 0.192292 | -0.024325 | 0.146251 | -0.117585 | -0.097257 | -0.029092 | 0.468650 | ... | -0.650872 | 25.010033 | 8.575643 | 6.443521 | 10.611340 | 2.625251 | 0.978865 | 8.912443 | 6.134972 | 2.466205 |
| 3 | 0 | 3 | 0.172030 | 1.869631 | 0.315535 | -0.115844 | 0.644933 | -0.177177 | -0.084760 | -0.005555 | ... | -12.221191 | 13.205143 | 7.843828 | 6.295854 | 8.878566 | 2.462410 | 1.528608 | 8.757740 | 6.337809 | 3.016420 |
| 4 | 0 | 4 | -0.129526 | 0.141332 | -0.056453 | 0.363036 | -0.175806 | -0.139477 | -0.146133 | -0.091358 | ... | -20.844439 | 34.261242 | 8.581761 | 5.341557 | 10.086728 | 4.175112 | 0.822732 | 7.705692 | 6.097578 | 1.592030 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17415 | 9 | 1737 | -0.145046 | -0.128737 | -0.112647 | 0.142011 | 1.271666 | -0.239937 | -0.147267 | 0.206571 | ... | 18.987158 | 88.912521 | 14.005446 | 4.747704 | 14.364185 | 5.400997 | -4.578933 | 7.493397 | 6.005784 | -2.288573 |
| 17416 | 9 | 1738 | 2.309876 | -0.107943 | 0.096737 | 0.388151 | -0.094068 | 0.135940 | 1.428808 | 0.396734 | ... | -25.133770 | -47.618355 | 3.087091 | 8.515288 | 6.055699 | 0.040828 | 7.544350 | 10.324475 | 6.698108 | 11.865839 |
| 17417 | 9 | 1739 | -0.109858 | -0.120113 | 2.313648 | 1.052860 | 0.306034 | 0.665497 | -0.037204 | 0.915989 | ... | 47.409962 | -34.578732 | 0.629738 | 4.521232 | -0.554824 | 2.005620 | 10.426287 | 7.917491 | 7.289971 | 14.011857 |
| 17418 | 9 | 1740 | -0.166764 | -0.079054 | -0.100959 | -0.092999 | 0.723891 | -0.145387 | 0.248314 | 0.729864 | ... | -4.563132 | 75.327049 | 11.351472 | 4.740134 | 12.162279 | 4.815999 | -2.085558 | 6.704915 | 8.612834 | -1.724663 |
| 17419 | 9 | 1741 | -0.180064 | 0.002528 | -0.110202 | 0.128322 | 1.424868 | -0.206283 | -0.096041 | 0.099965 | ... | 12.599486 | 78.174316 | 13.844024 | 5.231505 | 13.941362 | 4.533796 | -4.492154 | 8.145507 | 6.417513 | -2.590778 |
17420 rows × 1297 columns
# save the dataframe to a csv file
# df.to_csv("data.csv", index=False)
# load the dataframe from the csv file
# df = pd.read_csv(r"data.csv")
fig = px.scatter(
    df,
    x="x_pca",
    y="y_pca",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-15, 25],
    range_y=[-15, 25],
    title="Animation of PCA over epochs",
)
# make the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()
Observations: PCA is a linear dimension reduction technique that seeks to maximize variance and preserves large pairwise distances. This was seen in our dataset when different classes ended up far apart. However, this way of reducing dimensionality may lead to poor visualization when dealing with non-linear manifold structures; thus, other dimensionality reduction methods were investigated.
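The variance-maximizing behavior can be inspected directly via PCA's `explained_variance_ratio_`, which reports how much of the total variance the two plotted axes retain. A sketch on synthetic data (the `demo_` names are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# synthetic features with a few dominant directions of variance
X_demo = rng.normal(size=(500, 50)) * np.linspace(10.0, 0.1, 50)

demo_pca = PCA(n_components=2)
demo_pca.fit(X_demo)
# fraction of the total variance retained by the two plotted axes
print(demo_pca.explained_variance_ratio_.sum())
```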
fig = px.scatter(
    df,
    x="x_tsne_euclidean",
    y="y_tsne_euclidean",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-120, 120],
    range_y=[-120, 120],
    title="Animation of t-SNE with metric euclidean over epochs",
)
# make the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()
fig = px.scatter(
    df,
    x="x_tsne_manhattan",
    y="y_tsne_manhattan",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-120, 120],
    range_y=[-120, 120],
    title="Animation of t-SNE with metric manhattan over epochs",
)
# make the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()
Observations: t-SNE differs from PCA by preserving only small pairwise distances (local similarities), whereas PCA preserves large pairwise distances to maximize variance. This is visible in our plots: the clusters are clearly farther apart than in PCA.
fig = px.scatter(
    df,
    x="x_umap_euclidean",
    y="y_umap_euclidean",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-10, 20],
    range_y=[-5, 20],
    title="Animation of UMAP with metric euclidean over epochs",
)
# make the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()
fig = px.scatter(
    df,
    x="x_umap_hamming",
    y="y_umap_hamming",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-5, 20],
    range_y=[-5, 10],
    title="Animation of UMAP with metric hamming over epochs",
)
# make the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()
fig = px.scatter(
    df,
    x="x_umap_correlation",
    y="y_umap_correlation",
    color="label",
    animation_frame="epoch",
    animation_group="image",
    range_x=[-10, 20],
    range_y=[-15, 25],
    title="Animation of UMAP with metric correlation over epochs",
)
# make the plot larger
fig.update_layout(
    width=1000,
    height=1000,
)
fig.show()
Observations: UMAP is another dimension reduction technique that can be used for visualization similarly to t-SNE, but also for general non-linear dimension reduction. It models the manifold with a fuzzy topological structure. UMAP is competitive with t-SNE in visualization quality, preserves more of the global structure, and has superior run-time performance.
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=df[df["epoch"] == 0]["x_tsne_manhattan"],
        y=df[df["epoch"] == 0]["y_tsne_manhattan"],
        mode="markers",
        name="epoch 0",
        marker=dict(
            size=6,
            color="red",
            symbol="circle",
        ),
    )
)
fig.add_trace(
    go.Scatter(
        x=df[df["epoch"] == 9]["x_tsne_manhattan"],
        y=df[df["epoch"] == 9]["y_tsne_manhattan"],
        mode="markers",
        name="epoch 9",
        marker=dict(
            size=6,
            color="blue",
            symbol="square",
        ),
    )
)
# draw a line from each point's epoch-0 position to its epoch-9 position
for i in range(len(df[df["epoch"] == 0])):
    fig.add_trace(
        go.Scatter(
            x=[
                df[df["epoch"] == 0]["x_tsne_manhattan"].iloc[i],
                df[df["epoch"] == 9]["x_tsne_manhattan"].iloc[i],
            ],
            y=[
                df[df["epoch"] == 0]["y_tsne_manhattan"].iloc[i],
                df[df["epoch"] == 9]["y_tsne_manhattan"].iloc[i],
            ],
            mode="lines",
            line=dict(width=0.5, color="black"),
            showlegend=False,
        )
    )
fig.update_layout(
    # label the axes
    xaxis_title="x_tsne_manhattan",
    yaxis_title="y_tsne_manhattan",
    width=1000,
    height=1000,
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=12,
            color="black",
        ),
        bgcolor="LightSteelBlue",
        bordercolor="Black",
        borderwidth=2,
    ),
)
# set title
fig.update_layout(title_text="t-SNE with connected points")
fig.show()
fig = go.Figure()
for label in df["label"].unique():
    fig.add_trace(
        go.Scatter(
            x=df[df["label"] == label]["x_tsne_manhattan"],
            y=df[df["label"] == label]["y_tsne_manhattan"],
            mode="lines",
            name=label,
            line=dict(width=0.3),
        )
    )
fig.update_layout(
    # label the axes
    xaxis_title="x_tsne_manhattan",
    yaxis_title="y_tsne_manhattan",
    width=1000,
    height=1000,
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=12,
            color="black",
        ),
        bgcolor="LightSteelBlue",
        bordercolor="Black",
        borderwidth=2,
    ),
)
fig.update_layout(title_text="t-SNE with metric manhattan linking the states")
fig.show()
fig = go.Figure()
for label in df["label"].unique():
    fig.add_trace(
        go.Scatter(
            x=df[df["label"] == label]["x_tsne_euclidean"],
            y=df[df["label"] == label]["y_tsne_euclidean"],
            mode="lines",
            name=label,
            line=dict(width=0.4),
        )
    )
fig.update_layout(
    # label the axes
    xaxis_title="x_tsne_euclidean",
    yaxis_title="y_tsne_euclidean",
    width=1000,
    height=1000,
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=12,
            color="black",
        ),
        bgcolor="LightSteelBlue",
        bordercolor="Black",
        borderwidth=2,
    ),
)
fig.update_layout(title_text="t-SNE with metric euclidean linking the states")
fig.show()
fig = go.Figure()
for label in df["label"].unique():
    fig.add_trace(
        go.Scatter(
            x=df[df["label"] == label]["x_umap_euclidean"],
            y=df[df["label"] == label]["y_umap_euclidean"],
            mode="lines",
            name=label,
            line=dict(width=0.35),
        )
    )
fig.update_layout(
    # label the axes
    xaxis_title="x_umap_euclidean",
    yaxis_title="y_umap_euclidean",
    width=1000,
    height=1000,
    legend=dict(
        x=0,
        y=1,
        traceorder="normal",
        font=dict(
            family="sans-serif",
            size=12,
            color="black",
        ),
        bgcolor="LightSteelBlue",
        bordercolor="Black",
    ),
)
fig.update_layout(title_text="UMAP with metric euclidean linking the states")
fig.show()
Observations: Here we can see which paths the points take. In the middle there is a lot of movement; after that, the points move around much more toward the outside.
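The amount of movement could also be quantified, for example as the distance each class centroid travels in the embedding between epochs. A sketch on synthetic coordinates (in the notebook, the x/y columns of df would be used instead; the `demo_` names are placeholders):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# synthetic embedding: 2 epochs x 100 points, class "B" drifts by +5 in x
demo_df = pd.DataFrame({
    "epoch": np.repeat([0, 1], 100),
    "label": np.tile(np.repeat(["A", "B"], 50), 2),
    "x": rng.normal(size=200),
    "y": rng.normal(size=200),
})
demo_df.loc[(demo_df["epoch"] == 1) & (demo_df["label"] == "B"), "x"] += 5.0

# centroid of each class at each epoch, then how far it travels between epochs
cent = demo_df.groupby(["epoch", "label"])[["x", "y"]].mean()
moved = pd.Series(
    np.linalg.norm(cent.loc[1].values - cent.loc[0].values, axis=1),
    index=cent.loc[1].index,
)
print(moved)  # class "B" moves much farther than class "A"
```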